NumPy is the fundamental package for scientific computing in Python. Its most used object is the multidimensional array. Arrays can have any number of dimensions and store their data compactly in the computer's RAM, which makes them efficient to handle and easy to pass to other libraries. Furthermore, most of NumPy is implemented in C, which makes it fast.
This is how NumPy is usually imported and used to create a NumPy array:
In [ ]:
import numpy as np
In [ ]:
data = [1, 10, 2, 3, 8.0]  # data is a list
a = np.array(data) # a is now a numpy array
In [ ]:
type(a)
In [ ]:
a
This gives the shape of the array (its length along each dimension):
In [ ]:
a.shape
The number of dimensions:
In [ ]:
a.ndim
The number of elements:
In [ ]:
a.size
The total number of bytes taken by the elements:
In [ ]:
a.nbytes
The attribute dtype describes the data type of the elements:
In [ ]:
a.dtype
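The attributes above are related: nbytes equals size times the number of bytes per element, which NumPy exposes as the attribute itemsize (not used elsewhere in this notebook). A minimal check, assuming the default float64 dtype that NumPy picks when the list contains a float:

```python
import numpy as np

# One element is a float, so NumPy stores all five as float64
a = np.array([1, 10, 2, 3, 8.0])

print(a.dtype)              # float64
print(a.itemsize)           # 8 bytes per float64 element
print(a.size * a.itemsize)  # 5 * 8 = 40
print(a.nbytes)             # 40, the same total
```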
Multidimensional arrays can be created from nested lists:
In [ ]:
data = [[0.0, 2.0, 4.0, 6.0], [1.0, 3.0, 5.0, 7.0]]
b = np.array(data)
In [ ]:
b
In [ ]:
b.shape, b.ndim, b.size, b.nbytes
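For a nested list, each inner list becomes one row, so the shape is (number of inner lists, length of each inner list). A small sketch checking this against the values above, assuming the float64 dtype inferred from the float literals:

```python
import numpy as np

# Two inner lists of four numbers each -> 2 rows and 4 columns
data = [[0.0, 2.0, 4.0, 6.0], [1.0, 3.0, 5.0, 7.0]]
b = np.array(data)

rows, cols = b.shape
print(rows, cols)   # 2 4
print(b[1])         # the second inner list is now the second row
print(b.nbytes)     # 8 elements * 8 bytes = 64
```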
The function arange is similar to Python's built-in range, but it returns an array instead of a range object:
In [ ]:
c = np.arange(10)
c
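Like range, arange also accepts start, stop and step arguments, and the stop value is excluded. Unlike range, it also accepts a float step; for non-integer steps, linspace is usually the safer choice because floating-point rounding can make the number of points hard to predict. A short comparison:

```python
import numpy as np

r = list(range(2, 10, 3))      # [2, 5, 8] from a range object
c = np.arange(2, 10, 3)        # the same values, stored in an array
f = np.arange(0.0, 1.0, 0.25)  # float step; 1.0 itself is excluded

print(r)
print(c)
print(f)
```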
The function linspace creates a given number of equally spaced points between a start and a stop value (both included by default):
In [ ]:
e = np.linspace(0.0, 10, 21)  # 21 points, spaced 0.5 apart
e
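With the end point included, the spacing is (stop - start) / (num - 1). The sketch below uses two linspace parameters not shown above, retstep (also return the spacing) and endpoint (whether to include the stop value):

```python
import numpy as np

# retstep=True returns the spacing together with the points
e, step = np.linspace(0.0, 10, 21, retstep=True)
print(len(e), step)             # 21 0.5
print((10 - 0.0) / (21 - 1))    # 0.5, the same spacing computed by hand

# endpoint=False leaves out the stop value: 20 points from 0.0 to 9.5
f = np.linspace(0.0, 10, 20, endpoint=False)
print(f[-1])                    # 9.5
```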
Similar to MATLAB, there are also functions like empty, zeros and ones. Note that empty only allocates the array without initializing it, so its values are arbitrary.
In [ ]:
np.empty((4,4))
In [ ]:
np.zeros((3,3))
In [ ]:
np.ones((3,3))
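All these functions take the shape as a tuple and accept an optional dtype. The sketch below also uses np.full, a related function not mentioned above that fills the array with a chosen value:

```python
import numpy as np

z = np.zeros((2, 3))            # 2 rows, 3 columns of 0.0 (float64 by default)
o = np.ones((2, 3), dtype=int)  # ones stored as integers
f = np.full((2, 3), 7.5)        # every element set to 7.5

# empty does not initialize memory: always write its values before reading them
e = np.empty((2, 3))
e[:] = 1.0

print(z, o, f, e, sep="\n")
```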
dtype (for data type) is the attribute holding the data type shared by all elements. This data type is usually inferred from the input, but it can be set explicitly when the array is created.
For instance, this array implicitly gets an integer dtype:
In [ ]:
a = np.array([0, 1, 2, 3])
In [ ]:
a, a.dtype
But you can force the creation of a complex array
In [ ]:
b = np.zeros((2,2), dtype=np.complex64)
b
or a float array
In [ ]:
c = np.arange(0, 10, 2, dtype=np.float64)  # np.float was removed in NumPy 1.24; use float or np.float64
c
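The dtype can also be changed after creation. A minimal sketch using the array method astype (not used elsewhere in this notebook), which returns a new array and leaves the original untouched; note that converting floats to integers truncates toward zero:

```python
import numpy as np

x = np.array([1.7, -1.7, 2.0])   # inferred as float64

xi = x.astype(np.int64)          # truncates toward zero: [1, -1, 2]
xc = x.astype(np.complex128)     # imaginary parts are zero

print(xi, xi.dtype)
print(xc, xc.dtype)
print(x.dtype)                   # x itself is still float64
```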
Mathematical operations can be performed on the whole array without writing a for loop. For instance:
In [ ]:
a = np.linspace(0.0, 10.0, 5)
print('a =', a)
b = np.ones(5)
print('b =', b)
In [ ]:
a * 2 # every element in the array is multiplied by 2
In [ ]:
a + b  # addition works element by element; so do -, * and /
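To see what "without a for loop" means, here is a sketch computing the same expression twice, once with an explicit Python loop over the indices and once on the whole arrays:

```python
import numpy as np

a = np.linspace(0.0, 10.0, 5)   # 0.0, 2.5, 5.0, 7.5, 10.0
b = np.ones(5)

# Explicit loop, one element at a time
loop_result = np.empty(5)
for i in range(len(a)):
    loop_result[i] = a[i] * 2 + b[i]

# The same computation on whole arrays
vector_result = a * 2 + b

print(vector_result)
print(np.array_equal(loop_result, vector_result))  # True
```

The vectorized form is shorter and runs the loop inside NumPy's compiled code instead of in the Python interpreter.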
Slicing also works on arrays, and it can be applied along several dimensions at once.
In [ ]:
a = np.random.rand(5, 5)  # 5x5 array of random numbers drawn uniformly from [0, 1)
In [ ]:
print(a)
Each dimension has its own index
In [ ]:
print(a[0, 0], a[0, 1])  # the first index selects the row, the second the column
To extract the values of a whole column, the following syntax can be used:
In [ ]:
a[:,0] # this is the first column
The last row can be extracted as follows:
In [ ]:
a[-1, :]  # this is the last row
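A row or column selected with a single integer index comes back as a one-dimensional array, while a range of one element keeps the dimension. A sketch checking the shapes:

```python
import numpy as np

a = np.random.rand(5, 5)

col = a[:, 0]    # first column, returned as a 1-D array
row = a[-1, :]   # last row, also 1-D
print(col.shape, row.shape)         # (5,) (5,)

# a[-1] is shorthand for a[-1, :]
print(np.array_equal(a[-1], row))   # True

# Slicing the column with a range keeps it as a 5x1 two-dimensional array
print(a[:, 0:1].shape)              # (5, 1)
```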
Slicing also works with ranges of indices (the end index is excluded). This selects rows 0 and 1 and columns 0 to 2:
In [ ]:
a[0:2,0:3]
Assignment also works with slices:
In [ ]:
a[0:2,0:3] = -4.0
In [ ]:
a
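A detail worth knowing: basic slicing returns a view that shares memory with the original array, so writing through the slice modifies the original. A sketch of this behaviour, using the array method copy to get an independent array:

```python
import numpy as np

a = np.zeros((3, 3))

view = a[0:2, 0:2]      # a view into a, no data is copied
view[:] = -4.0          # this also changes a
print(a[0, 0])          # -4.0

independent = a[0:2, 0:2].copy()  # a separate copy of the block
independent[:] = 99.0             # a is not affected
print(a[0, 0])                    # still -4.0
```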
Arrays can be indexed using other boolean arrays.
For instance, consider these two arrays with the age and gender of a set of 10 people:
In [ ]:
age = np.array([23, 56, 67, 89, 23, 56, 27, 12, 2, 72])
gender = np.array(['m', 'o', 'f', 'f', 'm', 'f', 'm', 'o', 'm', 'o'])
Suppose that we want to select only the people whose gender is marked as 'o' (other).
The following statement creates a new boolean array; each element is True where the condition holds and False otherwise:
In [ ]:
ii = (gender == 'o')
print(ii)
To get the ages of the people with gender 'o', all we have to do is:
In [ ]:
age[ii]
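A boolean array is also handy for counting: True counts as 1 and False as 0, so summing the mask gives the number of matches. A sketch with the data above, also using np.count_nonzero (a function not mentioned in the text):

```python
import numpy as np

age = np.array([23, 56, 67, 89, 23, 56, 27, 12, 2, 72])
gender = np.array(['m', 'o', 'f', 'f', 'm', 'f', 'm', 'o', 'm', 'o'])

ii = (gender == 'o')

print(ii.sum())               # 3 people with gender 'o'
print(np.count_nonzero(ii))   # 3, same count
print(age[ii])                # [56 12 72]
print(age[ii].mean())         # mean age of that group
```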
This logic can be extended to combined conditions. For instance, let's select the people older than 10 and younger than 50:
In [ ]:
ii = (age > 10) & (age < 50)  # & is the element-wise logical AND; the parentheses are required
print(age[ii])
print(gender[ii])
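Besides &, the operators | (element-wise OR) and ~ (element-wise NOT) combine boolean arrays. The comparisons must be wrapped in parentheses because these operators bind more tightly than < and >:

```python
import numpy as np

age = np.array([23, 56, 67, 89, 23, 56, 27, 12, 2, 72])

young_or_old = (age < 18) | (age > 70)   # either condition
not_young = ~(age < 18)                  # negation of a condition

print(age[young_or_old])   # [89 12  2 72]
print(age[not_young])
```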
The following is also a valid syntax
In [ ]:
age[age>30]
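Two related uses of a condition, sketched with functions and idioms not shown above: np.nonzero returns the positions of the matching elements instead of their values, and a boolean mask can appear on the left-hand side of an assignment to modify only the selected elements:

```python
import numpy as np

age = np.array([23, 56, 67, 89, 23, 56, 27, 12, 2, 72])

# Indices of the elements that satisfy the condition
idx = np.nonzero(age > 30)[0]
print(idx)      # [1 2 3 5 9]

# Cap every age above 70 at 70, working on a copy to keep age unchanged
capped = age.copy()
capped[capped > 70] = 70
print(capped)
```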
Using a = np.random.normal(size=1000), generate an array of 1000 random numbers drawn from a normal (i.e. Gaussian) distribution with mean zero and standard deviation one.
Print the number of elements with values larger than 2.0. Is this number close to what you would expect from the properties of a Gaussian distribution?
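One possible solution sketch. The expected fraction comes from the standard normal tail, P(X > 2) = 0.5 * erfc(2 / sqrt(2)), computed here with math.erfc from the standard library; the count itself varies from run to run because no random seed is fixed:

```python
import math
import numpy as np

a = np.random.normal(size=1000)

# Count the samples above 2.0
n_above = np.count_nonzero(a > 2.0)

# Tail probability of a standard normal beyond 2, about 0.0228
p = 0.5 * math.erfc(2.0 / math.sqrt(2.0))
expected = p * a.size   # about 23 out of 1000

print(n_above, expected)
```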
Universal functions (or ufuncs) are functions that take arrays as inputs, operate on them element by element and return arrays (or scalars). They are fast (implemented in C) and make it possible to write simpler Python code without for loops.
The NumPy documentation includes a list of all available universal functions.
For instance, one could generate an array of values
In [ ]:
t = np.linspace(0.0, np.pi, 10)
print(t)
and then compute the values of the sin function:
In [ ]:
print(np.sin(t))
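Every ufunc is an instance of np.ufunc, and ufuncs can be combined in ordinary arithmetic expressions. A sketch checking the identity sin² + cos² = 1 on the same points, and showing that a ufunc applied to a scalar returns a scalar:

```python
import numpy as np

t = np.linspace(0.0, np.pi, 10)

print(isinstance(np.sin, np.ufunc))   # True

# Combine several ufuncs element by element
y = np.sin(t) ** 2 + np.cos(t) ** 2
print(np.allclose(y, 1.0))            # True

print(np.sqrt(16.0))                  # 4.0
```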
Using a = np.random.normal(size=1000), generate an array of 1000 random numbers drawn from a normal (i.e. Gaussian) distribution with mean zero and standard deviation one.
Then, using only ufuncs on a, generate a new array b that is -1 wherever a is negative and 1 wherever a is positive.
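One possible solution sketch using the ufunc np.sign, which returns -1 for negative values and 1 for positive ones (and 0 for an exact zero, practically impossible with normal samples). np.copysign(1.0, a) is another ufunc-only option, shown for comparison:

```python
import numpy as np

a = np.random.normal(size=1000)

b = np.sign(a)              # -1.0 where a < 0, 1.0 where a > 0
b2 = np.copysign(1.0, a)    # 1.0 carrying the sign of each element of a

nonzero = a != 0
print(np.array_equal(b[nonzero], b2[nonzero]))   # True
```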